A confidential case study · internal RAG agent · entitlement-aware
A senior-analyst-grade research agent — embedded in Slack and the firm's intranet — that answers policy, precedent, and process questions from the firm's own internal corpus, without exposing a single document to a public model.
The firm's knowledge existed, but every senior analyst's inbox was the search engine, and the bottleneck.
Deal memos, IC decks, policies, and playbooks scattered across SharePoint, Notion, and shared drives. Findable only if you already knew where to look.
The same precedent and policy questions cycled back to the same three or four people, every week — pulling them out of the work they were hired to do.
Nothing could leave the firm's environment. Public APIs were a non-starter. Whatever the answer was, it had to live entirely inside the VPC.
New hires spent weeks learning where knowledge lived before they could actually use it. Day-one productivity was a fantasy.
Envyro partnered with the firm to design and deploy a single-tenant, entitlement-aware RAG agent — indexing 200,000+ internal documents and serving every team through Slack and the intranet.
Every answer carries citations back to the source document and page. Every retrieval respects the user's actual access rights. Nothing the user shouldn't see ever enters the prompt.
Built by Envyro · Running inside the firm's VPC.
BM25 + dense vectors over deal memos, IC decks, policies, and playbooks — twelve years of institutional knowledge made searchable.
Every query is filtered through the user's access rights before retrieval, so the model cannot reason over a document the user isn't entitled to see.
Zero onboarding, no new app to learn. The agent lives where the work already happens — DM it, mention it, ask it inline.
Every claim links back to its source document and page. No source means no answer — and the user can verify in one click.
What used to mean DM'ing a partner and waiting until tomorrow now takes four seconds — and the answer comes with citations.
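The entitlement-first behavior described above can be illustrated with a minimal sketch. It assumes each indexed document carries a set of groups allowed to read it, mirrored from the source system; the `Document` class, the `allowed_groups` field, and the sample group names are hypothetical and not taken from the firm's deployment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    # Hypothetical ACL field: groups allowed to read this document,
    # assumed to be mirrored from SharePoint, Notion, or the shared drive.
    allowed_groups: FrozenSet[str]


def entitled_subset(docs: Iterable[Document], user_groups: Iterable[str]) -> List[Document]:
    """Return only the documents that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [d for d in docs if d.allowed_groups & groups]


corpus = [
    Document("ic-deck-014", "IC deck text", frozenset({"deal-team-a", "partners"})),
    Document("travel-policy", "Travel policy text", frozenset({"all-staff"})),
    Document("comp-memo", "Compensation memo text", frozenset({"partners"})),
]

# Search and prompt construction would only ever see this subset.
visible = entitled_subset(corpus, {"all-staff", "deal-team-a"})
print([d.doc_id for d in visible])  # ['ic-deck-014', 'travel-policy']
```

Because the filter runs before search, a document outside the user's groups is never scored, retrieved, or placed in the prompt. A production system would more likely push this filter into the index query itself rather than filter in application code.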
A single platform deployed inside the firm's VPC, rolled out team-by-team over six weeks. No public APIs, no third-party model exposure, no shared tenancy.
Source connectors wired into SharePoint, Notion, and shared drives. The firm's access model mirrored exactly — no shortcuts.
Corpus indexed for hybrid retrieval. Eval suite built against real internal questions. Citation behavior tuned until it stopped guessing.
Rolled out team-by-team. Every query, retrieval, and feedback signal logged for tuning. The system gets sharper every week.
A representative month: roughly 7,200 queries across the firm, about 92% answered cleanly with citations. The rest were routed to the right human SME with the question and retrieved context attached.
The remaining 8% of queries, those where the model isn't confident in a citation, land in front of the right human SME with the question and partial context already attached.
Hybrid-retrieved, entitlement-filtered, and cited back to source — answer delivered in the surface the user asked from, in under five seconds.
Surfaced to the SME with the original question, partial retrieval, and the model's hesitation reason — so the human picks up exactly where the agent stopped.
When the model isn't confident in a citation, it doesn't guess — it asks. That single decision is what makes the system safe to run firm-wide.
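The answer-or-ask decision can be sketched as a simple routing function. This is an illustration under stated assumptions, not the deployed logic: the `Draft` and `Escalation` classes, the numeric `confidence` score, and the 0.7 threshold are all hypothetical. Only the three fields handed to the SME (original question, partial retrieval, hesitation reason) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Union

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off, to be tuned against an eval suite


@dataclass
class Draft:
    question: str
    answer: str
    citations: List[str]
    confidence: float  # assumed score in [0, 1] for how well citations support the answer


@dataclass
class Escalation:
    question: str
    partial_retrieval: List[str]
    hesitation_reason: str


def route(draft: Draft, threshold: float = CONFIDENCE_THRESHOLD) -> Union[Draft, Escalation]:
    """Return the draft if it is well supported, otherwise hand off to a human SME."""
    if not draft.citations:
        return Escalation(draft.question, [], "no supporting source retrieved")
    if draft.confidence < threshold:
        reason = f"citation confidence {draft.confidence:.2f} below threshold {threshold:.2f}"
        return Escalation(draft.question, draft.citations, reason)
    return draft


confident = route(Draft("What is the travel cap?", "See policy.", ["travel-policy p.3"], 0.91))
hesitant = route(Draft("Any precedent for this structure?", "Possibly.", ["ic-deck-014 p.12"], 0.42))
print(type(confident).__name__)    # Draft
print(hesitant.hesitation_reason)  # citation confidence 0.42 below threshold 0.70
```

At the roughly 7,200 monthly queries and 8% escalation share quoted above, this path would carry about 576 questions a month, so the hand-off payload matters as much as the answer path.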
A single pipeline carries every query through five stages — entitlements, retrieval, generation, citation, and feedback — in under five seconds, with full traceability at every step.
Analyst pings the agent in Slack or the intranet widget. No new tool, no context-switch tax.
User's access rights loaded in real time. Retrieval scope is narrowed before search even runs.
BM25 + dense search over the entitled subset of the corpus. Best of lexical and semantic, on the right slice.
LLM answers only from retrieved sources. No source means no answer — the system would rather say it doesn't know.
Answer returned with linked citations. Thumbs and corrections logged for ongoing tuning.
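The retrieval and generation stages above can be sketched as follows. The case study says only that BM25 and dense search are combined; it does not say how. Reciprocal rank fusion is used here as one common way to merge two ranked lists, and it is an assumption. The function names, the document IDs, and the constant `k=60` are illustrative, and both input rankings are assumed to have been produced over the already-entitled subset.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; each list adds 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def respond(sources: Sequence[str]) -> str:
    """Decline when nothing was retrieved; otherwise answer only from the sources."""
    if not sources:
        return "I don't know: no entitled source supports an answer."
    # A real system would call the LLM here with only these sources in the prompt.
    return "Answer grounded in: " + ", ".join(sources)


bm25_ranking = ["memo-7", "policy-2", "deck-4"]       # lexical results (illustrative)
dense_ranking = ["policy-2", "playbook-9", "memo-7"]  # semantic results (illustrative)

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)        # ['policy-2', 'memo-7', 'playbook-9', 'deck-4']
print(respond([]))  # I don't know: no entitled source supports an answer.
```

Documents ranked well by both retrievers rise to the top, which is the "best of lexical and semantic" effect. The empty-sources branch is the "no source means no answer" rule expressed as a guard before generation.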
What a single research question used to mean for the firm, versus what it means now. The shape of the work is the same; the time to answer collapsed.
Senior analyst hours come back. New hires are productive on day one. Knowledge no longer walks out the door when people leave. And nothing crosses the firm's boundary, ever.
Returned to investment work, not file-hunting. Across the analyst bench, that's measurable IC throughput.
Day-one access to firm precedent and policy — without having to know which partner to ask first.
Knowledge stays in the system, not in inboxes. When someone leaves, what they knew doesn't leave with them.
Single-tenant, in-VPC, audit-logged. The agent runs where the data already lives — no exceptions.
Envyro is a specialized AI agency that designs, deploys, and maintains custom AI agents and pipelines that work in production. We stay on call as your systems evolve.
Let's talk
Tell us where the knowledge sits. We'll show you what a production-grade RAG agent inside your perimeter looks like — and what the next two weeks could return.